Automatically-extracted Thesauri for Cross-language IR: When Better Is Worse
Author
Abstract
A statistical algorithm for extracting bilingual term dictionaries (thesauri) from parallel text is presented, along with refinements for improving their size and accuracy. Somewhat paradoxically, increasing the accuracy of the extracted thesaurus can in fact reduce the performance of an IR system that uses it to translate queries for cross-language information retrieval.
Similar resources
Automatic processing of multilingual medical terminology: applications to thesaurus enrichment and cross-language information retrieval
OBJECTIVES We present in this article experiments on multi-language information extraction and access in the medical domain. For such applications, multilingual terminology plays a crucial role when working on specialized languages and specific domains. MATERIAL AND METHODS We propose firstly a method for enriching multilingual thesauri which extracts new terms from parallel corpora, and seco...
Similarity Thesauri and Cross-Language Retrieval
This paper describes a method for constructing a thesaurus automatically from a corpus of suitable documents, using standard information retrieval methods. The resulting thesauri can be used for user-initiated query expansion, automatic query expansion, as well as cross-language retrieval. Researchers at the Swiss Federal Institute of Technology in Zürich developed and evaluated this method in ...
Machine Generation of Thesauri: Adapting to Evolving Vocabularies in Design Documentation
A new breed of engineering design tools are electronic design notebooks, which are electronic versions of the traditional engineer’s logbook. They capture design information as it is generated, providing a rich, unfiltered history of a design project. This presents great potential for accessing past design decisions and rationale. This paper examines ways of searching for design information by ...
An Association Thesaurus for Information Retrieval
Although commonly used in both commercial and experimental information retrieval systems, thesauri have not demonstrated consistent benefits for retrieval performance, and it is difficult to construct a thesaurus automatically for large text databases. In this paper, an approach, called PhraseFinder, is proposed to construct collection-dependent association thesauri automatically using large full-...
International Conference on Engineering Design
A new breed of engineering design tools are electronic design notebooks, which are electronic versions of the traditional engineer’s logbook. They capture design information as it is generated, providing a rich, unfiltered history of a design project. This presents great potential for accessing past design decisions and rationale. This paper examines ways of searching for design information by ...
Journal title:
Volume, issue:
Pages: -
Publication date: 1998